Web based classification of Tamil documents using ABPA

Author

  • S. Kanimozhi
Abstract

Automatic text classification methods based on the vector space model (VSM), artificial neural networks (ANN), k-nearest neighbor (KNN), Naive Bayes (NB) and support vector machines (SVM) have been applied to English-language documents and have gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of ANN to the classification of Tamil-language documents. Tamil is a morphologically rich Dravidian classical language. The development of the internet has led to an exponential increase in the number of electronic documents, not only in English but also in other regional languages. The automatic classification of Tamil documents has not been explored in detail so far. In this paper, a corpus is used to construct and test the ANN model. Methods of document representation and of assigning weights that reflect the importance of each term are discussed. In a traditional word-matching-based categorization system, the most popular document representation is VSM. This method needs a high-dimensional space to represent the documents. The ANN classifier requires a smaller number of features. The experimental results show that the ANN model achieves 93.33% using the Back Propagation Algorithm (BPA), which is better than the performance of VSM, which yields 90.33% on Tamil document classification. In this paper, our goal is to increase this figure to 94.33% using the Advanced Back Propagation Algorithm (ABPA).
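As a rough illustration of the VSM representation and term weighting mentioned in the abstract, the sketch below builds weighted term vectors and compares documents by cosine similarity. The abstract does not name the weighting scheme, so TF-IDF is an assumption here, and the function names (`tfidf_vectors`, `cosine`) and the toy English tokens are hypothetical placeholders rather than the paper's corpus or method.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of non-empty token lists.

    TF-IDF is an assumed weighting scheme; the paper's scheme is not stated.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # One dimension per vocabulary term, hence the high-dimensional space.
        vectors.append(
            [(tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy placeholder documents (already tokenized); not taken from the paper.
docs = [
    "cricket match score team".split(),
    "team wins cricket final".split(),
    "election vote party leader".split(),
]
vocab, vecs = tfidf_vectors(docs)
```

Each vector has one entry per vocabulary term, which is why VSM needs a high-dimensional space as the corpus grows. Real Tamil text would also need tokenization and morphological processing before this step, which the sketch does not attempt.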

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Tamil Search Engine

The web creates new challenges for information retrieval. The amount of information on the web, as well as the number of new users, is growing rapidly. Search engine technology has to scale dramatically to keep up with the growth of the web. In this paper, we present the details of constructing and maintaining a Tamil Search Engine. We discuss issues such as the crawler, the database storage...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide sorted results according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm that uses reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Supervised Methods for Domain Classification of Tamil Documents

The era of digitization creates the need for domain classification in both online and offline applications. The need for automatic text classification arises from its use in diverse fields. Hence, various methodologies such as machine learning algorithms have been proposed for this task. Here, automatic document classification of Tamil documents has been proposed by considering the exponenti...

An Alternate Method of Classifying Allergic Bronchopulmonary Aspergillosis Based on High-Attenuation Mucus

BACKGROUND AND AIM: Allergic bronchopulmonary aspergillosis (ABPA) is classified radiologically based on the findings of central bronchiectasis (CB) and other radiologic features (ORF). However, the long-term clinical significance of these classifications remains unknown. We hypothesized that the immunological activity and outcomes of ABPA could be predicted on HRCT chest finding of high-attenua...

Publication date: 2012